Learning Purposeful Behaviour in the Absence of Rewards
Abstract
Artificial intelligence is commonly defined as the ability to achieve goals in the world. In the reinforcement learning framework, goals are encoded as reward functions that guide agent behaviour, and the sum of observed rewards provides a notion of progress. However, some domains have no such reward signal, or have a reward signal so sparse as to appear absent. Without reward feedback, agent behaviour is typically random, often dithering aimlessly and lacking intentionality. In this paper we present an algorithm capable of learning purposeful behaviour in the absence of rewards. The algorithm proceeds by constructing temporally extended actions (options), through the identification of purposes that are “just out of reach” of the agent's current behaviour. These purposes establish intrinsic goals for the agent to learn, ultimately resulting in a suite of behaviours that encourage the agent to visit different parts of the state space. Moreover, the approach is particularly suited for settings where rewards are very sparse, and such behaviours can help in the exploration of the environment until reward is observed.
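To make the notion of an option concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a hypothetical reward-free corridor of ten states and a hand-built option that walks toward an intrinsic goal state. The names `Option`, `make_goal_option`, `run_option`, and `step` are illustrative. The abstract describes options that are learned from discovered purposes, whereas here the goal is simply hard-coded.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

N_STATES = 10  # hypothetical corridor with states 0..9 and no rewards anywhere


def step(state: int, action: int) -> int:
    """Deterministic corridor dynamics: action is -1 (left) or +1 (right)."""
    return min(max(state + action, 0), N_STATES - 1)


@dataclass
class Option:
    """A temporally extended action: where it may start, how it acts, when it stops."""
    can_start: Callable[[int], bool]
    policy: Callable[[int], int]
    should_stop: Callable[[int], bool]


def make_goal_option(goal: int) -> Option:
    """Hand-built option that walks toward an intrinsic goal state.

    This is an assumption for illustration; the abstract describes options
    that are learned, not hard-coded.
    """
    return Option(
        can_start=lambda s: s != goal,
        policy=lambda s: 1 if s < goal else -1,
        should_stop=lambda s: s == goal,
    )


def run_option(state: int, option: Option, max_steps: int = 100) -> Tuple[int, int]:
    """Follow the option's policy until it terminates; return (final_state, steps)."""
    if not option.can_start(state):
        return state, 0
    steps = 0
    while not option.should_stop(state) and steps < max_steps:
        state = step(state, option.policy(state))
        steps += 1
    return state, steps


final_state, steps = run_option(0, make_goal_option(7))
print(final_state, steps)
```

An agent that can invoke such an option as a single decision moves across the corridor in one choice instead of relying on many undirected primitive steps, which is the sense in which a suite of options can drive exploration before any reward is seen.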
Similar resources
The mediating role of organizational purposeful forgetting in the influence of genuine leadership on organizational learning in Staff of Ministry of Petroleum
The purpose of this study was to investigate the effect of genuine leadership on organizational learning, with purposeful organizational forgetting as the mediating variable. The study was applied in terms of purpose and descriptive-correlational in terms of implementation. The statistical population of the study consisted of 840 employees of the headquarters of the Ministry of P...
The Necessity of Average Rewards in Cooperative Multirobot Learning
Learning can be an effective way for robot systems to deal with dynamic environments and changing task conditions. However, popular single-robot learning algorithms based on discounted rewards, such as Q-learning, do not achieve cooperation (i.e., purposeful division of labor) when applied to task-level multirobot systems. A task-level system is defined as one performing a mission that is decompo...
Delay discounting and its correlation with time perspective in medical interns
Introduction: Delay discounting (DD) means preferring small immediate rewards to large delayed rewards. This study aimed to assess delay discounting and its correlation with scores on the Zimbardo Time Perspective Inventory (ZTPI). Method: In a cross-sectional study, DD and time perspective were investigated in 93 medical interns by means of computer software and the ZTPI. In d...
A comparative study of the effect of clinical training in principles and techniques by role-playing versus the traditional method on the caring behaviours of nursing students
Background and Aim: Caring is a multidimensional nursing concept that can be actualized within the baccalaureate nursing curriculum through the purposeful teaching and student-centered learning of core values. Although the learning of caring is widely accepted, it has not been proved through research. The aim of this study was to assess and compare the effectiveness of clinical practice of...
Reinforcement Learning in Biologically-Inspired Collective Robotics: A Rough Set Approach
This thesis presents a rough set approach to reinforcement learning. This is made possible by considering behaviour patterns of learning agents in the context of approximation spaces. Rough set theory introduced by Zdzisław Pawlak in the early 1980s provides a ground for deriving pattern-based rewards within approximation spaces. Learning can be considered episodic. The framework provided by an...
Journal title:
- CoRR
Volume abs/1605.07700
Publication date 2016